
Genetics Selection Evolution

Springer Science and Business Media LLC

Preprints posted in the last 30 days, ranked by how well they match Genetics Selection Evolution's content profile, based on 33 papers previously published here. The average preprint has a 0.01% match score for this journal, so anything above that is already an above-average fit.

1
kinference: Pairwise kinship detection for Close-Kin Mark-Recapture

Bravington, M. V.; Baylis, S. M.; Eveson, P.; Feutry, P.

2026-05-21 genetics 10.64898/2026.05.18.725841 medRxiv
Top 0.1%
4.8%

Close-Kin Mark-Recapture (CKMR) is a statistical framework for estimating demographic parameters of wild populations. Instead of recapturing individuals, it relies on identifying closely related pairs such as parents and offspring, or siblings. By measuring how often such close kin are "recaptured" among sampled animals (whether alive or dead), scientists can estimate demographic parameters such as census size, mortality rates, and connectivity. CKMR is starting to change fisheries and wildlife management by providing more reliable demographic information, even for many species that resist conventional approaches. Here we introduce the kinference R package, which provides a set of tools for finding close-kin pairs among thousands of samples, each genotyped at thousands of SNPs, and for associated quality control. The CKMR context implies requirements and assumptions that differ from those of many other kinship programs. In particular, kinference accounts empirically for linkage without requiring a genome assembly, can estimate and control false-negative and false-positive probabilities, and can cope with null alleles. The package has been developed and used in numerous CKMR projects since 2017. This paper documents the assumptions, statistical algorithms, and intended workflow of kinference.
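The abstract does not spell out kinference's kinship statistics. As a rough illustration of what a pairwise kinship test computes, the Python sketch below scores two SNP genotype vectors with a log-odds ratio for "parent-offspring" versus "unrelated". It assumes unlinked loci, Hardy-Weinberg proportions, known allele frequencies strictly between 0 and 1, and no genotyping errors or null alleles (issues kinference is described as handling more carefully). All function names are hypothetical and are not part of the kinference API.

```python
import math

def hwe_genotype_prob(g, p):
    """P(genotype) under Hardy-Weinberg; g = copies of allele A (0, 1, 2), p = freq(A)."""
    q = 1.0 - p
    return {2: p * p, 1: 2.0 * p * q, 0: q * q}[g]

def offspring_given_parent(g_off, g_par, p):
    """P(offspring genotype | parent genotype) for a true parent-offspring pair.

    The parent transmits A with probability g_par / 2; the offspring's other
    allele is drawn from the population with freq(A) = p.
    """
    t = g_par / 2.0
    q = 1.0 - p
    return {2: t * p, 1: t * q + (1.0 - t) * p, 0: (1.0 - t) * q}[g_off]

def pop_log_odds(geno_1, geno_2, freqs):
    """Sum over loci of log[P(pair | parent-offspring) / P(pair | unrelated)].

    Hypothetical helper: assumes unlinked loci, HWE, 0 < p < 1 and no genotyping errors.
    """
    total = 0.0
    for g1, g2, p in zip(geno_1, geno_2, freqs):
        num = offspring_given_parent(g2, g1, p)
        if num == 0.0:
            return float("-inf")  # opposite homozygotes: impossible for a parent-offspring pair
        total += math.log(num / hwe_genotype_prob(g2, p))
    return total

# Usage with three toy loci (made-up genotypes and allele frequencies)
print(pop_log_odds([2, 1, 0], [2, 1, 1], [0.5, 0.5, 0.5]))  # about 0.693, i.e. log(2)
```

Summed over thousands of loci, true parent-offspring pairs tend to score well above unrelated pairs. In this simplified version a single opposite-homozygote locus excludes a pair outright, which is one reason a production tool has to model genotyping errors and null alleles explicitly.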

2
Temporal changes in allele frequency facilitate detection of adaptive variants in winter wheat (Triticum aestivum L.) breeding programs

Johansen, N. H.; Sarup, P.; Hansen, P.; Orabi, J.; Jahoor, A.; Ramstein, G. P.

2026-05-04 genetics 10.64898/2026.04.30.721918 medRxiv
Top 0.1%
4.4%

In quantitative genetics, candidate SNPs are identified through genotype-phenotype associations inferred with genome-wide association studies (GWAS). In this study, we explore an alternative approach to detect genetic variants with non-neutral effects by tracking temporal trends in allele frequency in a winter wheat (Triticum aestivum L.) breeding population over an eight-year period, from which signals of selection may be inferred. Selection signatures were inferred with a generalized linear model, where we modeled trends in allele frequency as a function of time (crossing year). These signatures of selection were used to prioritize variants. Associations between phenotypic performance and individual load of prioritized variants were then investigated. Furthermore, we assessed whether incorporating selection information into a genomic best linear unbiased prediction (GBLUP) model improves model performance in terms of quality of fit and prediction ability. Our findings indicate that the inferred signals of selection are effective in identifying non-neutral variants. Variants under strong negative selection were associated with a decrease in protein content adjusted for grain yield (p-value < 0.01), while genetic variants that had been under moderate to high levels of positive selection were associated with increased grain yield (p-value < 0.01). However, incorporating selection information did not improve prediction accuracy. In conclusion, temporal trends in allele frequency can be used to detect non-neutral variants. The proposed approach may hence complement traditional quantitative genetic methods for detecting non-neutral genetic variation. This approach may allow breeders to detect non-neutral variants earlier in the breeding cycle, without resorting to phenotypic data.
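The abstract states that allele-frequency trends were modelled with a generalized linear model of frequency on crossing year but does not give the exact specification. A minimal sketch, assuming a binomial GLM with a logit link for a single SNP and allele counts pooled by crossing year, is shown below. The function name is hypothetical, and the Newton-Raphson fit is written out only to keep the example dependency-free; genetic drift and population structure are not modelled.

```python
import math

def fit_allele_trend(years, alt_counts, totals, iters=25):
    """Binomial GLM with logit link, logit(freq) = b0 + b1 * year, fitted by Newton-Raphson.

    years      : crossing year of each cohort (ideally offset to start near 0)
    alt_counts : copies of the focal allele observed in each cohort
    totals     : total allele copies genotyped in each cohort
    """
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for t, k, n in zip(years, alt_counts, totals):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * t)))
            resid = k - n * p            # contribution to the score vector
            w = n * p * (1.0 - p)        # contribution to the Fisher information
            g0 += resid
            g1 += resid * t
            h00 += w
            h01 += w * t
            h11 += w * t * t
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + I^{-1} * score, with I the 2x2 information matrix
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical data: 8 crossing years, 1,000 allele copies per year, allele rising in frequency
years = list(range(8))
totals = [1000] * 8
counts = [round(1000 / (1 + math.exp(-(-1.0 + 0.3 * t)))) for t in years]
b0, b1 = fit_allele_trend(years, counts, totals)
print(round(b1, 2))  # slope on the logit scale, close to the simulated 0.3
```

A positive slope indicates an allele rising in frequency over crossing years and a negative slope one being purged; ranking SNPs by such slopes is one way to prioritize candidate variants.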

3
Increasing Phenomic Prediction Efficiency Using A Principal Component Analysis Based Pre-Processing Of Near Infrared Spectra

Bienvenu, C.; Roger, J.-M.; Sene, M.; Castro Pacheco, S. A.; Singer, M.; Felaniaina, B. L.; Terrier, N.; De Bellis, F.; Pot, D.; DE VERDAL, H.; Segura, V.

2026-05-13 genetics 10.64898/2026.05.10.724118 medRxiv
Top 0.1%
0.9%

Phenomic prediction (PP) is a method for predicting breeding values from near-infrared spectroscopy (NIRS) data. Spectra pre-processing is a key step in the PP analysis pipeline and generally involves chemometric methods. However, the genetics community still has little understanding of what pre-processing does and why it improves performance. Consequently, pre-processing is chosen either arbitrarily or by searching for the optimal set of methods and associated parameters. In this study, we propose a PCA-based pre-processing method in which genetic values of spectra are estimated on a set of principal components instead of individual wavelengths. Estimation is thus based on a few informative, orthogonal features of the spectra instead of many correlated, uninformative wavelengths. We tested this new pre-processing method on five data sets representing four plant species (maize, rice, sorghum and grapevine). Results show that it performs as well as, or better than, the best classical chemometric pre-processing methods in almost all cases. Combining PCA-based and classical chemometric pre-processing maximizes predictive ability. Moreover, this pre-processing method opens up possibilities for better understanding and selecting the parts of the spectral information that are relevant for predicting breeding values. Indeed, components together representing about 1% of spectral variability were found to be responsible for most of the predictive ability of PP.

Plain language summary: Cultivated plants are the result of a breeding process in which genetic values are used to select the plants to breed. Estimating breeding values requires substantial experimental resources and is time consuming. Phenomic prediction is a low-cost, high-throughput method for estimating genetic values that is increasingly being used. It often uses near-infrared spectroscopy measurements, which are easy to collect and routinely used in many species, as predictors of genetic values. However, near-infrared spectra generally require pre-processing before being used in prediction. Currently used pre-processing methods come from the chemometrics community and are still not well understood by geneticists. In this study, we propose a new pre-processing approach that performs as well as or better than the best chemometric pre-processing methods generally used, reduces computation time, and allows a better understanding of which parts of the spectral information are relevant for prediction.

Core Ideas:
- Working on principal components of spectra instead of wavelengths increases the predictive ability of phenomic prediction and performs as well as or better than classical chemometric pre-processing.
- Working on principal components of spectra requires less parameter optimization than chemometric pre-processing.
- About 1% of spectral variance is responsible for most of the predictive power of phenomic prediction.
- Working on principal components of spectra that were first pre-processed with classical chemometric methods can increase predictive ability even further.
- PCA-based methods are valuable for optimizing the predictive ability of phenomic prediction and could be used more widely in quantitative genetics.
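One common way to realise the PCA-based idea is to centre the spectra, take a singular value decomposition, and use the leading principal component scores as features in place of individual wavelengths. The NumPy sketch below shows only that projection step. The number of components, the random placeholder "spectra" and the function name are assumptions for illustration, and the genetic model the authors fit on the components is not reproduced.

```python
import numpy as np

def pca_scores(spectra, n_components):
    """Project centred spectra (lines x wavelengths) onto their leading principal components."""
    X = spectra - spectra.mean(axis=0)                 # centre each wavelength
    U, s, Vt = np.linalg.svd(X, full_matrices=False)   # X = U diag(s) Vt
    scores = U[:, :n_components] * s[:n_components]    # orthogonal component scores
    explained = s**2 / np.sum(s**2)                    # share of spectral variance per component
    return scores, Vt[:n_components], explained[:n_components]

# Placeholder data: 50 lines x 200 wavelengths of random "absorbance" values
rng = np.random.default_rng(0)
spectra = rng.normal(size=(50, 200))
scores, loadings, explained = pca_scores(spectra, n_components=10)
print(scores.shape, explained.sum())
```

The 10 score columns, rather than 200 correlated wavelengths, would then enter the prediction model. Because the scores are mutually orthogonal, the contribution of each component to predictive ability can be examined separately, independently of how much spectral variance it explains.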

4
Progeny differentiation in faba bean using hyperspectral images and machine learning

Schlichtermann, R.-H.; Warnemuende, S.; Tietgen, H.; Welna, G.; Stahl, A.; Wittkop, B.; Snowdon, R.

2026-05-21 genetics 10.64898/2026.05.19.725957 medRxiv
Top 0.2%
0.7%

Though currently a minor crop, faba bean is a promising source of plant-based protein as global diets shift towards more plant-based nutrition. To realise this potential, advances in breeding and cultivation are crucial. To exploit heterosis, faba bean breeding frequently uses synthetic cultivars, which involve open pollination of inbred lines to produce a mixture of F1 hybrid seeds and self-pollinated offspring. Pure F1 hybrid cultivars are currently unavailable due to unstable cytoplasmic male sterility (CMS) systems. An ability to distinguish F1 seeds from their parental inbreds via characteristics associated with xenia effects could change this. The xenia effect refers to the influence of paternal pollen on seed traits, for example seed weight and cotyledon cells in faba bean. In this study, we exploited the xenia effect captured in hyperspectral imaging data to develop machine learning scenarios for discriminating between parental and F1 seeds of open-pollinated synthetic combinations (Syn-1). The hyperspectral data were pre-processed using Savitzky-Golay filtering to reduce noise and smooth the spectra. Various machine learning algorithms were applied, with Bayesian hyperparameter optimisation. The scenarios achieved up to 98.9 % accuracy in separating the parental components of Syn-1. When all seeds were included, the model achieved an F1 score of 40.7 %, indicating moderate detection and classification performance. As the harmonic mean of precision and recall, the F1 score accounts for both the correctness of F1 seed detections and the completeness with which F1 seeds were detected. While this approach does not yet enable the development of full hybrid cultivars, it paves the way for hybrid-enriched cultivars. These could help to streamline breeding for synthetic cultivars and potentially increase yields, for example by increasing the proportion of F1 hybrid seeds in synthetic cultivars.
This study extends knowledge of the xenia effect in faba bean and provides a basis for further research aimed at enhancing breeding methods and productivity.
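The F1 score reported above is the harmonic mean of precision and recall (the "F1" in the metric name is unrelated to the F1 hybrid generation). A minimal Python sketch with hypothetical seed counts, not data from the study:

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall for the 'F1 seed' class."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)  # fraction of seeds called F1 that truly are F1
    recall = tp / (tp + fn)     # fraction of true F1 seeds that were called F1
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts: 8 F1 seeds found, 2 parental seeds wrongly called F1, 4 F1 seeds missed
print(f1_score(tp=8, fp=2, fn=4))  # about 0.727, i.e. 8/11
```

Because the harmonic mean is pulled towards the smaller of the two rates, a classifier that catches most F1 seeds but also flags many parental seeds (or the reverse) still receives a low score.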

5
New chromosome-level haplotyped genome assemblies and annotation for the Japanese Quail (Coturnix Japonica)

Cabau, C.; Degalez, F.; Leroux, S.; Gourichon, D.; Serre, R.-F.; Vernette, C.; Donnadieu, C.; Iampietro, C.; Vandecasteele, C.; Pitel, F.; Klopp, C.

2026-05-14 genomics 10.64898/2026.05.12.724545 medRxiv
Top 0.2%
0.7%

The Japanese quail (Coturnix japonica) is a widely used model organism in developmental biology, genetics, and agriculture. Here, we present new, haplotyped, high-quality genome assemblies of the Japanese quail, generated using a combination of state-of-the-art sequencing technologies, including PacBio HiFi long reads, Oxford Nanopore sequencing, and Hi-C scaffolding. This assembly has a total length of 1.19 Gb, 80% of which is included in chromosomes, and is highly complete (BUSCO score aves_odb10: 97.3). Assembly metrics show a marked improvement in contiguity, with a significantly higher scaffold N50 and a lower number of contigs compared to the reference genome assembly. Remarkably, the assembly extends previously truncated chromosome ends, with 31 telomeres detected. In addition, we merged the existing Ensembl and RefSeq annotations and obtained a combined set of 26,102 genes, of which 25,038 were successfully mapped on the improved assembly haplotype 1 (Cjap1.hap1). Together, these new genome assemblies and their enriched annotation provide a robust genomic framework for future research. They enhance our ability to investigate developmental processes, genetic and epigenetic inheritance, and host-pathogen interactions. Furthermore, they offer valuable insights for conservation genetics and sustainable breeding programs. This resource represents a critical step forward in leveraging the full potential of the Japanese quail as a model species in both basic and applied research.
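Scaffold N50, the contiguity metric cited in the quail and marbled teal abstracts, is the length L such that scaffolds of length L or longer together contain at least half of the assembled bases. A minimal Python sketch with made-up scaffold lengths:

```python
def n50(lengths):
    """Smallest length L such that sequences of length >= L cover at least half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:   # reached half of the assembled bases
            return length
    raise ValueError("empty assembly")

# Hypothetical scaffold lengths (in Mb): total 300, half reached at the second-largest scaffold
print(n50([100, 80, 50, 30, 20, 10, 10]))  # 80
```

Merging contigs into longer scaffolds raises N50 even when total assembly length is unchanged, which is why it is used alongside contig counts to compare contiguity between assemblies.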

6
Novel linkage disequilibrium-based genotype-by-environmental interaction method for genomic prediction of cotton yield and fibre quality traits

Li, Z.; Li, X.; Liu, S.; Wilson, I.; Zhu, Q.-H.; Stiller, W.; Conaty, W.

2026-05-06 plant biology 10.64898/2026.05.03.722538 medRxiv
Top 0.3%
0.5%

Genomic prediction (GP) across diverse environments has the potential to accelerate genetic gain in cotton breeding programs. A major challenge in GP is modelling genotype-by-environment interactions (GEI), which is essential for selecting stable, high-performing genotypes under variable production conditions. However, incorporating GEI into GP models increases dimensionality and computational complexity, risking models whose run times and computational demands make them impractical for commercial breeding-scale data sets. This study has two primary aims. First, we evaluate the practical benefits of GEI-informed GP for predicting economically important cotton traits. Second, we develop and assess advanced statistical modelling strategies for integrating genomic and environmental data at scale. We propose a dimensionality reduction approach that combines linkage disequilibrium network analysis with principal component techniques to reduce redundancy while preserving informative variation. Using this reduced dataset, we implement Bayesian linear regression models and, for comparison, deep residual neural networks for genomic prediction. Analyses were conducted on a large multi-environment dataset from the CSIRO cotton breeding program, comprising 3,236 breeding lines, 54 environmental covariates, and 8,049 yield and fibre quality phenotype records collected over 10 years at 9 locations, representing 41 year-location combinations. Results demonstrate that Bayesian linear regression approaches generally outperform BG-BLUP models, with all three linear/linear mixed methods providing clearly more reliable performance than the deep learning models. These findings highlight the value of interpretable statistical models for integrating genomic and environmental information to support selection decisions under diverse environmental conditions.
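The abstract does not detail how the LD network and principal components are combined. One simple version, shown here only as an illustrative assumption, links SNPs whose pairwise r² exceeds a threshold, treats each connected component of that network as a cluster, and replaces each cluster by its first principal component score. The threshold value and function name are hypothetical.

```python
import numpy as np

def ld_cluster_features(genotypes, r2_threshold=0.8):
    """Collapse SNPs in strong LD into one feature per cluster (illustrative sketch).

    genotypes    : lines x SNPs matrix of 0/1/2 calls (no monomorphic SNPs assumed)
    r2_threshold : hypothetical cut-off for drawing an edge in the LD network
    """
    G = genotypes - genotypes.mean(axis=0)
    r2 = np.corrcoef(G, rowvar=False) ** 2    # pairwise LD between SNP columns
    m = G.shape[1]
    parent = list(range(m))                   # union-find over SNPs

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]     # path halving
            i = parent[i]
        return i

    for i in range(m):
        for j in range(i + 1, m):
            if r2[i, j] >= r2_threshold:      # edge in the LD network
                parent[find(i)] = find(j)

    clusters = {}
    for j in range(m):
        clusters.setdefault(find(j), []).append(j)   # connected components
    features = []
    for cols in clusters.values():
        U, s, _ = np.linalg.svd(G[:, cols], full_matrices=False)
        features.append(U[:, 0] * s[0])       # first principal component score of the cluster
    return np.column_stack(features), list(clusters.values())

# Toy data: SNPs 0-2 are identical, SNPs 3-4 are identical, and the two groups are uncorrelated
a = [0, 2, 0, 2, 0, 2, 0, 2]
b = [0, 0, 2, 2, 0, 0, 2, 2]
geno = np.column_stack([a, a, a, b, b]).astype(float)
features, clusters = ld_cluster_features(geno)
print(features.shape, clusters)  # (8, 2) [[0, 1, 2], [3, 4]]
```

Five SNP columns collapse to two features here; on real data the threshold trades redundancy removal against loss of independent signal, and the reduced matrix would then feed the Bayesian regression or neural network models.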

7
Chromosome-level genome assembly and annotation of the threatened marbled teal (Marmaronetta angustirostris)

Ortego, J.; Lopez-Luque, R.; Backstrom, N.; Green, A. J.

2026-05-14 genomics 10.64898/2026.05.12.723956 medRxiv
Top 0.3%
0.5%

The marbled teal (Marmaronetta angustirostris) is a widely distributed but declining waterfowl species, classified as Near Threatened globally and Critically Endangered in Spain. Despite ongoing conservation actions, including ex situ management and population reinforcement programmes, the genomic consequences of long-term captivity, inbreeding, and patterns of functional genetic variation remain unknown due to the absence of a species-specific reference genome. Here, we present the first chromosome-level genome assembly for this species. The genome was generated using PacBio HiFi long reads and Omni-C data, yielding a 1.15 Gb assembly with a scaffold N50 of 76.95 Mb. A total of 97.16% of the assembly was anchored into 36 chromosome-scale scaffolds, including the Z and W sex chromosomes. BUSCO analysis recovered 99.2% of conserved avian genes. Gene prediction was performed using both ab initio and homology-based strategies, resulting in 16,048 protein-coding genes. This resource provides a foundation for genome-wide analyses of inbreeding, demographic history, and adaptive variation, and will support evidence-based in situ and ex situ conservation strategies for this threatened species.

8
A new method based on genome alignments provides a highly resolutive target enrichment set for weevils (Coleoptera, Curculionoidea)

ZELVELDER, B.; BENOIT, L.; LOISEAU, A.; HARAN, J.; ALLIO, R.

2026-05-13 evolutionary biology 10.64898/2026.05.09.724036 medRxiv
Top 0.3%
0.4%

Target enrichment methods have provided unprecedented advances in phylogenomics. Targeting hundreds of conserved regions has proven to be a good trade-off between cost and efficiency, while being useful for museomics and diverse non-model clades. Unfortunately, current methods for identifying such regions require high degrees of conservation within targeted elements, usually pushing researchers to rely on flanking data with little guarantee of homology. The growing number of high-quality genomes available across the Tree of Life opens new opportunities to improve marker selection. In this study, we introduce GABBI, a new method for designing target capture probes that takes advantage of genome alignments, avoiding the selection of a single reference genome, which can cause notable biases. We compare GABBI-derived markers with those from the most commonly used probe design method, PHYLUCE, at two taxonomic scales: the weevil superfamily Curculionoidea and the tribe Pachyrhynchini. At both scales, results show that our new method identifies more variable loci with greater phylogenetic resolving power than the PHYLUCE-derived ones. In doing so, we provide the first probe set specifically designed for weevils, targeting 4,255 shared homologous regions, and encourage future research on the systematics and macroevolution of one of the most diverse and economically important groups of insects. By providing GABBI as an automated, open-access pipeline, we hope to open new probe design opportunities for other taxonomic groups that face similar phylogenetic obstacles.

9
Reaction Norm Modeling of High-Dimensional Genomic and Environmental Data Improves Prediction Accuracy in Winter Wheat

Acharya, S. R.; Garcia-Abadillo, J.; Lyerly, J.; Brown-Guedira, G.; Jarquin, D.; Bandillo, N.

2026-05-08 genetics 10.64898/2026.05.05.722758 medRxiv
Top 0.3%
0.3%

Genomic prediction models that account for genotype-by-environment (GxE) interaction have the potential to accelerate the rate of genetic gain for yield and agronomic performance, yet relatively few studies have applied GxE prediction in public soft red winter wheat (Triticum aestivum) breeding programs. In this study, we extended a reaction norm-based genomic prediction framework by integrating weather-based environmental covariates to capture genotype-environment interactions more effectively. Key agronomic traits, including seed yield, plant height, test weight, and heading date, were evaluated across 33 environments (location-years) using over 3,200 breeding lines from the North Carolina State University small grains breeding program. Multiple genomic prediction models were compared using several cross-validation (CV) schemes representing common breeding scenarios. Across traits, the reaction norm M5 model, which incorporates both GxE and genotype-by-environmental covariate interactions (GxO), achieved the highest prediction accuracy (PA) in CV2 (predicting incomplete field trials), and in CV1 (predicting new lines) for yield and test weight. The highest PA was observed for test weight under CV2 (0.54) and for yield under CV1 (0.41). Under CV0 (predicting new environments), the M3 model incorporating GxE produced the highest PA across traits, with the greatest accuracy for plant height (0.45), although differences among M2, M3, and M4 were small. Prediction under CV00 (predicting new lines in new environments) remained more challenging, with PA values of 0.10-0.20 across traits. Overall, our results demonstrate that integrating environmental covariates into genomic prediction models can improve predictive performance across diverse wheat-growing environments in North Carolina, supporting their utility for applied breeding efforts.

CORE IDEAS
- Integrating genotype-by-environment (GxE) interactions with environmental covariates improves prediction accuracy across environments.
- Model performance varies by prediction scenario, with different approaches performing best for new lines, incomplete trials, or new environments.
- Prediction of new lines in new environments remains challenging.

PLAIN LANGUAGE SUMMARY
This study explores how adding environmental information to genomic prediction models can improve prediction accuracy in a public winter wheat breeding program. Using data from multi-environment trials conducted across diverse conditions in North Carolina, we evaluated statistical models that capture how different wheat lines respond to changing environments. By incorporating weather data, we improved the ability to predict performance across locations and years. These findings provide practical insights for refining selection strategies and accelerating genetic gain in wheat breeding.
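Reaction-norm GxE models are commonly implemented by adding an interaction kernel formed as the element-wise (Hadamard) product of a genomic relationship kernel and an environmental-covariate similarity kernel, both expanded to the level of phenotype records. The NumPy sketch below builds those kernels on toy data. Function and variable names are illustrative; the study's M2 to M5 model definitions are not reproduced, and fitting the mixed model itself (for example with a Bayesian or REML solver) is outside the scope of the sketch.

```python
import numpy as np

def reaction_norm_kernels(G, Omega, line_idx, env_idx):
    """Record-level kernels for a reaction-norm GxE model (illustrative sketch).

    G        : genomic relationship matrix among lines (q x q)
    Omega    : environmental similarity matrix from weather covariates (e x e)
    line_idx : line index of each phenotype record
    env_idx  : environment index of each phenotype record
    """
    Zg = np.eye(G.shape[0])[line_idx]      # records -> lines incidence matrix
    Ze = np.eye(Omega.shape[0])[env_idx]   # records -> environments incidence matrix
    K_g = Zg @ G @ Zg.T                    # genomic main effect
    K_e = Ze @ Omega @ Ze.T                # environmental-covariate main effect
    K_gxe = K_g * K_e                      # element-wise (Hadamard) product: interaction
    return K_g, K_e, K_gxe

# Toy example: 4 lines, 3 environments described by 5 standardised weather covariates
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 10))
G = X @ X.T / 10                     # stand-in for a marker-based relationship matrix
W = rng.normal(size=(3, 5))
Omega = W @ W.T / 5                  # similarity of environments in covariate space
line_idx = np.array([0, 1, 2, 3, 0, 1])
env_idx = np.array([0, 0, 1, 1, 2, 2])
K_g, K_e, K_gxe = reaction_norm_kernels(G, Omega, line_idx, env_idx)
print(K_gxe.shape)
```

Each entry of the interaction kernel is the genomic similarity of two records' lines multiplied by the covariate similarity of their environments, so records from related lines grown in similar weather are expected to share interaction effects. By the Schur product theorem the Hadamard product of two positive semidefinite kernels is itself positive semidefinite, so it is a valid covariance matrix.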

10
A novel matrix multiplication framework for modeling genotype-by-environment interaction in genomic prediction

Montesinos-Lopez, O. A.; Montesinos-Lopez, A.; Montesinos-Lopez, J. C.; Crossa, J.; Dreisigacker, S.; Hernandez-Suarez, C. M.; Ortiz, R.

2026-05-15 genetics 10.64898/2026.05.11.724414 medRxiv
Top 0.3%
0.3%

Accurate modeling of genotype-by-environment (GxE) interaction is critical for genomic prediction in plant breeding but remains challenging due to complex interaction structures. Conventional models often use the Hadamard product of genotype and environment covariance matrices to capture joint similarity, which may not fully represent GxE complexity. Here we propose a novel framework that derives covariance structures from the matrix multiplication of genotype and environment kernels, decomposing these into symmetric components that are incorporated as random effects in mixed models. Evaluated on 11 wheat and rice multi-environment datasets, both individually and across datasets, this approach consistently outperformed the traditional Hadamard-based model, improving prediction accuracy by up to 13.2% in Pearson's correlation and enhancing top-selection accuracy. Combining both methods yielded the highest performance, indicating that they capture complementary information. This framework offers a flexible, interpretable, and computationally feasible extension for modeling GxE interaction, potentially enhancing the effectiveness of genomic selection under diverse environmental conditions.
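To see why a matrix-product kernel needs a decomposition step, note that for two symmetric record-level kernels the ordinary product K_g K_e is generally not symmetric, whereas the conventional Hadamard kernel is. The NumPy sketch below contrasts the two and splits the product into its symmetric and skew-symmetric parts. This split is only one elementary decomposition chosen for illustration; the abstract does not say which symmetric components the authors use. Any component used as a covariance must also be positive semidefinite, which the symmetric part is not guaranteed to be.

```python
import numpy as np

def gxe_kernels(K_g, K_e):
    """Contrast Hadamard and matrix-product GxE kernels (both n x n, record level)."""
    hadamard = K_g * K_e                 # conventional element-wise GxE kernel (symmetric)
    product = K_g @ K_e                  # matrix product: generally NOT symmetric
    sym = 0.5 * (product + product.T)    # symmetric part of the product
    skew = 0.5 * (product - product.T)   # skew-symmetric remainder
    return hadamard, sym, skew

rng = np.random.default_rng(1)
A = rng.normal(size=(5, 5))
K_g = A @ A.T                            # toy positive semidefinite kernels
B = rng.normal(size=(5, 5))
K_e = B @ B.T
hadamard, sym, skew = gxe_kernels(K_g, K_e)
print(np.allclose(K_g @ K_e, (K_g @ K_e).T))  # typically False: the raw product is asymmetric
```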

11
Individual natal assignment in highly migratory species: the genomic baseline and its application in loggerhead turtles

Luna-Ortiz, A.; Barbanti, A.; Pegueroles, C.; Abreu-Grobois, F. A.; Casale, P.; Freggi, D.; Giralt, S.; Labastida-Estrada, E.; Llera-Herrera, R.; Machkour-M'Rabet, S.; Marco, A.; Margaritoulis, D.; Turkozan, O.; Pascual Berniola, M.; Carreras, C.

2026-05-10 evolutionary biology 10.64898/2026.05.06.723276 medRxiv
Top 0.3%
0.3%

Effective conservation of highly migratory species requires an understanding of genetic structure across breeding populations and access to high-resolution markers capable of assigning individuals from mixed aggregations (e.g. bycatch or new nesting sites) to their natal origins. Genomic approaches provide unprecedented resolution but add methodological challenges; thus, it is essential to first build a genomic baseline from known breeding areas and then evaluate strategies for assigning unknown individuals. To address this, we used 2b-RAD sequencing, a genome-reduction technique useful for degraded DNA, with loggerhead turtles as a case study. This species shows philopatric breeding, while juveniles and adults form mixed aggregations in foraging grounds. Our results highlight the importance of building baselines that include all potential source populations contributing to mixed aggregations. We detected hierarchical genetic differentiation at high resolution and successfully assigned the natal origin of 124 unknown individuals from four Mediterranean foraging grounds. These grounds showed distinct source contributions, and comparisons with previous studies suggest possible temporal shifts in stock composition. We provide a comprehensive genomic baseline for individual assignment of Atlanto-Mediterranean loggerhead turtles of unknown natal origin and a general framework for identifying population-specific threats in highly migratory species.
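Once a baseline of allele frequencies per breeding population is available, a classic way to assign an individual is to compute the likelihood of its multilocus genotype in each candidate population and pick the maximum. The sketch below assumes Hardy-Weinberg proportions, unlinked loci and baseline frequencies strictly between 0 and 1. The population names and frequencies are hypothetical, and the abstract does not state which assignment method or software the authors used.

```python
import math

def genotype_log_likelihood(genotype, freqs):
    """log P(multilocus genotype | population): HWE within loci, independence across loci."""
    ll = 0.0
    for g, p in zip(genotype, freqs):
        q = 1.0 - p
        ll += math.log({2: p * p, 1: 2.0 * p * q, 0: q * q}[g])
    return ll

def assign_natal_origin(genotype, baseline):
    """Return the baseline population under which the genotype is most likely."""
    return max(baseline, key=lambda pop: genotype_log_likelihood(genotype, baseline[pop]))

# Hypothetical baseline: frequency of the reference allele at 3 SNPs in two rookeries
baseline = {"rookery_A": [0.9, 0.8, 0.1], "rookery_B": [0.2, 0.3, 0.7]}
print(assign_natal_origin([2, 2, 0], baseline))  # rookery_A
```

The result is only as good as the baseline: an individual from a rookery missing from the baseline is still forced into one of the listed populations, which is why the abstract stresses including all potential source populations.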

12
An exact formula for the contribution of sampling error to r2, a common measure of linkage disequilibrium

Waples, R. S.

2026-05-21 evolutionary biology 10.64898/2026.05.19.726388 medRxiv
Top 0.3%
0.3%

Interest in quantifying linkage disequilibrium (LD, non-random associations of alleles at different loci) has skyrocketed in recent years as researchers have focused on using LD in genome-wide association studies (GWAS), for studying historical demography, and for estimating effective population size (Ne). The most widely used LD metric is r², the squared correlation of alleles at a pair of loci. Despite half a century of effort, an unbiased expectation of r² as a function of the many factors that can affect it (physical linkage, genetic drift, selection, migration, mutation, mating systems) remains elusive. Furthermore, even when all of these other factors are absent, empirical estimates of r² are upwardly biased by sampling a finite number (S) of individuals, and this bias must be accounted for to focus on the desired signal of LD. Previous approaches to estimating E(r²_sample), the expected contribution of sampling error to r², have been shown to be biased to greater or lesser degrees. The purpose of this short paper is to demonstrate that a simple and apparently exact expression for E(r²_sample) does exist for the special case where sampling error is the only factor contributing to r², in which case E(r²_sample) = 1/(S - 1). When other factors contribute heavily to LD, E(r²_sample) shrinks toward 0 as empirical r² → 1. However, for estimating contemporary Ne with unlinked markers, empirical r² will generally be small, and 1/(S - 1) will provide a robust estimate of E(r²_sample).
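The 1/(S - 1) result can be checked by simulation. The Python sketch below draws genotypes at two independent loci for S individuals, so that sampling error is the only source of association, and averages r² computed as the squared correlation of genotype counts. Treating r² as a genotype-count correlation is an assumption made for simplicity; the paper may work with a different r² estimator. Parameter values are arbitrary.

```python
import random

def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def mean_sampling_r2(S, p1, p2, reps, seed=1):
    """Average r^2 between two unlinked loci in HWE when only sampling error creates LD."""
    rng = random.Random(seed)
    total, used = 0.0, 0
    for _ in range(reps):
        # genotype = number of copies of the focal allele (0, 1 or 2) at each locus
        x = [(rng.random() < p1) + (rng.random() < p1) for _ in range(S)]
        y = [(rng.random() < p2) + (rng.random() < p2) for _ in range(S)]
        if len(set(x)) > 1 and len(set(y)) > 1:   # skip samples monomorphic at either locus
            total += r_squared(x, y)
            used += 1
    return total / used

S = 30
print(mean_sampling_r2(S, p1=0.5, p2=0.3, reps=4000), 1 / (S - 1))
```

The two printed numbers should be close (1/29 ≈ 0.0345). A short argument suggests why the expression can be exact: for any two fixed, non-constant vectors, averaging r² over all permutations of one of them gives exactly 1/(S - 1), and independent sampling of individuals makes every permutation equally likely.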

13
Rapid, Non-Destructive Visualization of α-Zein Expression and Grain Protein Concentration in Maize Using the Floury2-RFP Reporter Transgene

Li, C.; Heller, N. J.; Tiskevich, C. J.; Moose, S. P.

2026-05-07 plant biology 10.64898/2026.05.05.723001 medRxiv
Top 0.3%
0.3%

Kernel composition traits in maize, including protein accumulation, are of broad interest. The amounts of the most abundant proteins in maize endosperm, the α-zeins, can vary dramatically among genotypes and in response to soil nitrogen supply. Targeted reductions in α-zein accumulation can improve nitrogen utilization and the nutritional quality of maize grain but have traditionally required expensive and destructive phenotyping methods. The Floury2-RFP (Fl2-RFP) reporter gene enables rapid, non-destructive visualization of α-zein accumulation in individual maize kernels under white light. This is due to the high expression level programmed by the Fl2 promoter, the stability of zein proteins, and the use of monomeric RFP, which fluoresces without the need for multimerization. This study aimed to develop a method to quickly document and quantify Fl2-RFP accumulation using camera or smartphone images of either ears or shelled kernels. Results show that images of shelled kernels processed with FIJI software capture the Fl2-RFP reporter phenotype better than images of ears. Fl2-RFP confirms the strong maternal control of α-zein accumulation and, like grain protein concentration, responds to soil nitrogen supply. The Fl2-RFP phenotyping pipeline effectively quantified Fl2-RFP accumulation using color features from both camera and smartphone images. Smartphone imaging of Fl2-RFP in a diverse population of inbreds, followed by elastic net regression on extracted image features, predicted kernel protein concentration (as measured by near-infrared spectroscopy) with moderate accuracy (R2 = 0.68, MAE = 0.76, RMSE = 0.93). The spectral features that were most predictive of kernel protein concentration varied depending on whether the background endosperm color was white or yellow.
The integrated analysis of Fl2-RFP intensity and grain protein concentration indicates genetic variation for kernel protein accumulation and N-responsiveness that is distinct from the well-studied α-zeins. Our findings highlight the Fl2-RFP reporter gene as a valuable tool for investigating the genetic complexity of grain protein concentration and associated traits in maize.

14
Extremely low effective population size in a captive-bred population: partial mitigation through management practices

Lamarins, A.; Waples, R. S.; Piironen, J.; Primmer, C. R.

2026-05-12 evolutionary biology 10.64898/2026.05.12.724519 medRxiv
Top 0.4%
0.3%

Effective population size (Ne) is a critical parameter for evaluating the evolutionary and persistence potential of endangered populations and for designing sustainable conservation strategies. Captive breeding and release programs are widely used across taxa to reduce the risk of extinction when natural reproduction is insufficient or no longer possible, making it essential to assess their consequences. We used the case study of the landlocked Saimaa salmon (Salmo salar), one of the most critically endangered salmonid populations in Europe, with unique evolutionary significance due to its isolation from other populations since the last glaciation. Using long-term demographic data (1969-2024) from wild-caught founders of a captive breeding and release program, we estimated the effective population size under multiple scenarios of variance in reproductive success. Across scenarios, Ne ranged from 33 to 81 individuals, representing 32%-75% of the census size. Captive breeding practices aimed at equalizing parental contributions during fertilization and early life stages increased Ne by 12% compared to natural reproductive conditions. However, variation in survival after early developmental stages, typically beyond direct management control, remained a key determinant of Ne. Despite recent increases in the number of founders, the population remains genetically vulnerable due to historical bottlenecks. These results highlight that while captive breeding programs can partially mitigate genetic risks, their effectiveness depends critically on both controlled and uncontrolled sources of variance in reproductive success. Strengthening such programs may require combining breeding management with habitat restoration and, where appropriate, genetic rescue to ensure the long-term evolutionary potential of such unique and endangered populations.
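The link between variance in reproductive success and Ne can be illustrated with a standard large-population approximation for a population of constant size, in which each parent leaves on average two offspring: Ne ≈ 4N / (Vk + 2), where N is the number of parents and Vk the variance in lifetime offspring number. The sketch below implements only that textbook approximation, not the multi-scenario estimator used in the study; N = 100 and the Vk values are illustrative.

```python
def ne_from_vk(n_parents, var_k):
    """Approximate Ne = 4N / (Vk + 2) for a constant-size population (mean offspring number 2)."""
    return 4.0 * n_parents / (var_k + 2.0)

for vk in (0.0, 2.0, 6.0):                       # illustrative variances in offspring number
    ne = ne_from_vk(100, vk)
    print(f"Vk = {vk}: Ne = {ne:.0f}, Ne/N = {ne / 100:.2f}")
```

Poisson variance (Vk = 2) gives Ne = N; equalizing family contributions (Vk below 2) pushes Ne above N, while large Vk, for example from unequal survival after release, drives Ne well below the census size.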

15
When can whole-genome SNP heritability be reliably estimated from summary statistics?

Pham, B. K.; Davenport, S.; Azriel, D.; Schwartzman, A.

2026-05-16 genetics 10.64898/2026.05.13.724972 medRxiv
Top 0.4%
0.3%

LD Score Regression (LDSC) is a prominent method that estimates whole-genome SNP heritability from summary statistics via the slope of a linear regression of GWAS test statistics for a trait of interest on LD scores. The LDSC authors claimed that the free intercept in the regression accounts for confounding bias such as population stratification. In this study, we argue that the intercept in LDSC must be fixed to 1 for accurate SNP heritability estimation. We show both theoretically and with simulations that the estimated intercept does not accurately capture population stratification effects, and that estimating it adversely affects the accuracy of the heritability estimate, introducing bias and increasing variance. Fixing the intercept to 1 eliminates bias and reduces variance when no population stratification is present. On the other hand, under population stratification, LDSC is biased with both the free and the fixed intercept. Additionally, we show that LDSC standard errors are underestimated, potentially leading to false positives in downstream GWAS analyses.
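To make the free-versus-fixed intercept distinction concrete, the Python sketch below simulates chi-square statistics from the basic LDSC expectation E[χ²_j] = 1 + N·h²·ℓ_j/M (no stratification, so the true intercept is 1) and estimates h² by ordinary least squares with the intercept either estimated or fixed at 1. It omits the regression weights and block-jackknife standard errors used by the actual LDSC software, treats SNPs as independent, and uses arbitrary parameter values, so it illustrates the two estimators rather than reproducing the paper's analysis.

```python
import random

def ldsc_h2(chi2, ld_scores, n_samples, n_snps, fix_intercept):
    """OLS estimate of h2 from chi2_j ~ intercept + (N * h2 / M) * l_j."""
    if fix_intercept:
        # intercept fixed at 1: regress (chi2 - 1) on LD score through the origin
        slope = (sum(l * (c - 1.0) for l, c in zip(ld_scores, chi2))
                 / sum(l * l for l in ld_scores))
        intercept = 1.0
    else:
        m = len(chi2)
        lbar = sum(ld_scores) / m
        cbar = sum(chi2) / m
        slope = (sum((l - lbar) * (c - cbar) for l, c in zip(ld_scores, chi2))
                 / sum((l - lbar) ** 2 for l in ld_scores))
        intercept = cbar - slope * lbar
    return slope * n_snps / n_samples, intercept

# Simulated summary statistics (arbitrary values): chi2_j = E[chi2_j] * Z^2, Z ~ N(0, 1)
rng = random.Random(7)
M, N, h2 = 40000, 100000, 0.5
ld = [rng.uniform(1.0, 200.0) for _ in range(M)]
chi2 = [(1.0 + N * h2 * l / M) * rng.gauss(0.0, 1.0) ** 2 for l in ld]
h_fixed, a_fixed = ldsc_h2(chi2, ld, N, M, fix_intercept=True)
h_free, a_free = ldsc_h2(chi2, ld, N, M, fix_intercept=False)
print(h_fixed, h_free, a_free)
```

Repeating the simulation with different seeds would show both estimates centred near the true h² = 0.5 here, with the free-intercept estimate varying more, which is consistent with the abstract's point that estimating the intercept adds variance when there is no stratification.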

16
Development and validation of a multilocus sequence typing scheme for Fasciola hepatica using next-generation deep amplicon sequencing

Abbas, M.; Kozel, K.; Daramola, O.; Selemetas, N.; Robinson, M. W.; Morgan, E. R.; Chaudhry, U.; Betson, M.

2026-05-22 genetics 10.64898/2026.05.20.726500 medRxiv
Top 0.4%
0.3%

Fasciolosis caused by Fasciola hepatica is an economically important disease of sheep and cattle. Knowledge of the population genetic structure of F. hepatica is important for understanding gene flow and informing disease control. In the present study, we designed, developed, and validated a multilocus sequence typing (MLST) scheme based on six markers. Markers were selected by aligning newly sequenced whole-genome sequence (WGS) data with available reference genomes and choosing variable regions containing five or more single-nucleotide polymorphisms (SNPs) from different scaffolds of the F. hepatica reference genome Fasciola 10x pilon (GCA_900302435.1). Twenty markers were initially identified, of which 12 were multiplexed for deep amplicon sequencing after validation on DNA from adult worms and faecal eggs; six markers were ultimately retained for downstream population genetic analysis. These markers were used to investigate population genetic structure in 15 cattle-derived and 27 sheep-derived F. hepatica populations in the UK. A total of 53 unique alleles across the six MLST markers were identified from 30 faecal (cattle = 13, sheep = 17) and 12 adult worm (cattle = 2, sheep = 10) populations. Shared alleles were observed in sheep- and cattle-derived populations. Allelic variation was highest in the Scottish Borders, Southern Scotland, and South-West England, and lowest in North-West England. Genetic differentiation between cattle- and sheep-derived populations was minimal, with most genetic structuring occurring within rather than between populations. Five markers showed high allelic polymorphism, whereas one showed low polymorphism, highlighting the importance of multilocus approaches. Overall, this six-marker MLST panel provides a tool for population genetic studies and reveals high gene flow and clonal expansion of F. hepatica across hosts and regions in the UK.
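The marker-selection criterion described above (at least five SNPs, regions drawn from different scaffolds) can be expressed as a simple filter. The sketch below is a hypothetical illustration: the data structure, the rule of keeping the most variable region when a scaffold has several candidates, and the toy values are all assumptions, not the authors' pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateRegion:
    name: str        # hypothetical region identifier
    scaffold: str    # scaffold of the reference assembly
    snp_count: int   # SNPs observed in the aligned WGS data


def select_markers(regions, min_snps=5):
    """Keep regions with at least `min_snps` SNPs, at most one per scaffold.

    When a scaffold has several qualifying regions, the most variable one is
    kept (an assumed tie-break, not necessarily the authors' rule)."""
    best = {}
    for region in regions:
        if region.snp_count < min_snps:
            continue
        current = best.get(region.scaffold)
        if current is None or region.snp_count > current.snp_count:
            best[region.scaffold] = region
    # Most variable markers first; scaffold name breaks ties deterministically.
    return sorted(best.values(), key=lambda r: (-r.snp_count, r.scaffold))


if __name__ == "__main__":
    candidates = [
        CandidateRegion("A", "scaffold_1", 7),
        CandidateRegion("B", "scaffold_1", 5),
        CandidateRegion("C", "scaffold_2", 4),
        CandidateRegion("D", "scaffold_3", 6),
    ]
    print([r.name for r in select_markers(candidates)])
```

Drawing markers from distinct scaffolds is a common way to reduce the chance that two markers are tightly linked, which matters when they are later treated as independent loci.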

17
A songbird karyotype: cytogenetic confirmation of a migration-associated region rich in olfactory receptor genes.

Caballero Lopez, V.; Dedukh, D.; Ekman, D.; Kauzal, O.; Lundberg, M.; Odenthal-Hesse, L.; Proux-Wera, E.; Reifova, R.; Reif, J.; Altmanova, M.; Trifonov, V.; Bensch, S.

2026-05-07 genomics 10.64898/2026.05.04.721007 medRxiv
Top 0.4%
0.2%

Research on the genetics of bird migration is advancing rapidly, driven by improvements in sequencing and tracking technologies. In willow warblers (Phylloscopus trochilus), a complex, repeat-rich region named MARB (Migration Associated Repeat Block) was recently found to correlate with the routes individual birds take from Europe to their African wintering grounds. However, the genomic location of this region has remained unknown. Here, we characterized MARB using a combination of approaches to understand how it evolved. We describe the region using long-read genome assemblies of two willow warbler subspecies (P. t. trochilus and P. t. acredula) and two related species, the common chiffchaff (P. collybita) and the greenish warbler (P. trochiloides), together with whole-genome sequencing data from 76 willow warblers. Finally, we applied karyotyping and fluorescence in situ hybridization to willow warbler spermatocytes to locate MARB cytogenetically. Because of the many repeats, the scaffolds cannot be ordered in silico, but probe hybridization on the karyotype shows that MARB constitutes a single locus (~27.5 Mb) spanning most of the 11th largest chromosome in the willow warbler genome. Interestingly, the MARB regions of all species share several characteristics, such as relatively high GC content (50%), a high density of specific repeat families and, notably, more than 800 olfactory receptor sequences. Regions homologous to MARB may exist in several other migrant bird genomes but remain unassembled because of their complexity. Resolving these regions in species with migratory polymorphisms similar to those of willow warblers will be essential to determine whether MARB influences migratory behaviour across species.
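GC content, one of the features reported as shared across MARB regions, is typically summarised over fixed windows along an assembly. A minimal pure-Python sketch follows; the function names, window size and step are hypothetical parameters, and the abstract does not say how the 50% figure was computed.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G/C among unambiguous bases (A, C, G, T); N is ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return float("nan")  # window consists only of ambiguous bases
    return (seq.count("G") + seq.count("C")) / acgt


def windowed_gc(seq: str, window: int, step: int):
    """Yield (start, gc_fraction) for fixed-size windows along `seq`."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    for start in range(0, len(seq) - window + 1, step):
        yield start, gc_fraction(seq[start:start + window])


if __name__ == "__main__":
    toy = "ATGCGCGCAATTGGCCNNAT"  # hypothetical sequence
    for start, gc in windowed_gc(toy, window=10, step=5):
        print(start, round(gc, 2))
```

Excluding N bases from the denominator keeps assembly gaps, which are common in repeat-rich regions, from biasing the estimate downwards.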

18
Efficient Optimization of Genotype Pairs for Intercropping using Genomic Prediction and Bayesian Optimization

Kinoshita, S.; Iwata, H.

2026-05-18 genomics 10.64898/2026.05.15.725387 medRxiv
Top 0.5%
0.2%

Intercropping is a promising strategy to improve productivity and sustainability in agricultural systems, but designing effective genotype combinations remains a major challenge because the number of possible pairings grows combinatorially with the number of candidate genotypes. This creates a practical bottleneck, because field evaluation of all combinations is infeasible under realistic resource constraints. Here, we propose a framework that integrates genomic prediction and Bayesian optimization to support efficient decision-making in intercropping system design. Using genome-wide marker data from sorghum and soybean, we simulated intercropping performance across 5,214 genotype pairs under a range of genetic architectures, varying heritability, the correlation between direct and indirect genetic effects, and the contribution of pair-specific interactions. Genomic prediction models incorporating both direct and indirect genetic effects substantially improved prediction accuracy compared with models based on direct genetic effects alone, and inclusion of specific mixing ability further improved performance under high-heritability conditions. When coupled with Bayesian optimization, the models rapidly identified superior genotype pairs, requiring fewer evaluation cycles than random or prediction-only search strategies. Acquisition functions that account for predictive uncertainty were most effective in complex scenarios involving interaction effects or negative correlations between direct and indirect effects. These results demonstrate that combining genomic prediction with Bayesian optimization can substantially reduce the experimental burden of intercropping design while improving the efficiency of identifying high-performing genotype pairs. The proposed framework provides a practical approach for prioritizing candidate mixtures in breeding and field evaluation, and contributes to the development of data-driven strategies for sustainable agricultural systems.

Highlights
- A data-driven framework was developed to optimize genotype pairs in intercropping.
- Modeling indirect effects improved prediction accuracy across genotype pairs.
- Pair-specific interactions enhanced prediction under high-heritability conditions.
- Bayesian optimization identified superior pairs under limited evaluation capacity.
- The framework reduces field-testing requirements for intercropping system design.
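The contrast between prediction-only search and uncertainty-aware acquisition can be sketched in a few lines. The example below assumes that a genomic prediction model has already produced a predicted mean and standard deviation for each untested pair. Expected improvement and upper confidence bound are two common acquisition functions, used here as illustrations; the abstract does not name the functions the authors compared, and the genotype names are hypothetical.

```python
import math


def expected_improvement(mu, sigma, best, xi=0.0):
    """Expected improvement over `best` for a Gaussian prediction N(mu, sigma^2)."""
    gain = mu - best - xi
    if sigma <= 0:
        return max(gain, 0.0)
    z = gain / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return gain * cdf + sigma * pdf


def upper_confidence_bound(mu, sigma, kappa=2.0):
    """Optimistic score: predicted mean plus kappa standard deviations."""
    return mu + kappa * sigma


def next_pair(predictions, best, rule="ei"):
    """Pick the untested genotype pair to evaluate next.

    `predictions` maps pair -> (predicted mean, predictive sd), e.g. from a
    model with direct and indirect genetic effects (assumed, not shown)."""
    scores = {
        "mean": lambda mu, sd: mu,  # prediction-only (greedy) search
        "ei": lambda mu, sd: expected_improvement(mu, sd, best),
        "ucb": lambda mu, sd: upper_confidence_bound(mu, sd),
    }[rule]
    return max(predictions, key=lambda pair: scores(*predictions[pair]))


if __name__ == "__main__":
    preds = {("sorghum_1", "soy_7"): (10.0, 0.1),   # hypothetical pairs
             ("sorghum_4", "soy_2"): (9.0, 3.0)}
    for rule in ("mean", "ei", "ucb"):
        print(rule, next_pair(preds, best=10.0, rule=rule))
```

With these toy numbers the greedy rule picks the pair with the higher predicted mean, while both uncertainty-aware rules pick the less certain pair. This reflects the exploration behaviour that the abstract reports as helpful in complex scenarios.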

19
Gene model for the ortholog of Lst8 in Drosophila yakuba

Lawson, M. E.; Sanow, K. A.; Chetana, K.; Taylor, E.; Morgan, A.; Flannery, D.; Elsie, C.; Rele, C. P.; Reed, L. K.; O'Rourke, K. S.

2026-05-14 genomics 10.64898/2026.05.12.723325 medRxiv
Top 0.5%
0.2%
Show abstract

Gene model for the ortholog of Lst8 (Lst8) in the May 2011 (WUGSC dyak_caf1/DyakCAF1) Genome Assembly (GenBank Accession: GCA_000005975.1) of Drosophila yakuba. This ortholog was characterized as part of a developing dataset to study the evolution of the insulin/insulin-like growth factor signaling (IIS) pathway across the genus Drosophila, using the Genomics Education Partnership gene annotation protocol for Course-based Undergraduate Research Experiences.

20
The reliability and accuracy of recombination inferred by Shapeit2 duoHMM on whole genome sequence

Oubninte, S.; Ruczinski, I.; Yanek, L. R.; Mathias, R.; Bureau, A.

2026-05-10 genomics 10.64898/2026.05.06.723015 medRxiv
Top 0.5%
0.2%
Show abstract

Few studies have assessed the performance of population-based phasing combined with parental genotypes for inferring recombination from whole-genome sequence (WGS) data. In this study, our objective was to evaluate whether Shapeit2 duoHMM, a hidden Markov model that uses parental genotypes, infers recombination events reliably from WGS and with narrower intervals than from SNP arrays. We based our analysis on the overlap between recombination events inferred by Merlin on SNP genotypes and by Shapeit2 on WGS and SNP genotypes. We used a sample of 61 extended families from the GeneSTAR study with TOPMed freeze 8 WGS on 580 sequenced subjects (60% of the sample). Shapeit2 was run on WGS with a window size of 500 kilobases and 200 states. To mimic a SNP array, we extracted genotypes at 355,112 autosomal markers on the Illumina OmniExpress array. The number of recombination events per meiosis inferred by Shapeit2 on the WGS data (36.8) was consistent with the expected autosomal number (35.7), whereas Merlin overestimated this number (115.0). Of the recombination events inferred by Shapeit2 on WGS, 73% were detected by Merlin, a proportion rising to 91% when restricted to events also inferred by Shapeit2 on OmniExpress genotypes. Furthermore, Shapeit2 recombination intervals were narrower on WGS than on OmniExpress genotypes (median of 4,530 bp vs. 49,458 bp). This suggests that Shapeit2 on WGS is a reliable and accurate method for inferring recombination events.
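The concordance figures (73% and 91%) and the interval-width comparison come down to interval overlap and interval length. The sketch below is a hedged illustration of that bookkeeping. It assumes each inferred event is recorded as a (meiosis, chromosome, start, end) interval; the class, the overlap rule and the toy values are hypothetical and are not taken from the Shapeit2 or Merlin output formats.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Crossover:
    meiosis: str  # hypothetical parent-to-offspring transmission identifier
    chrom: str
    start: int    # left bound of the inferred interval (bp)
    end: int      # right bound of the inferred interval (bp)

    @property
    def width(self) -> int:
        return self.end - self.start


def overlaps(a: Crossover, b: Crossover) -> bool:
    """Same meiosis and chromosome, and the two intervals intersect."""
    return (a.meiosis == b.meiosis and a.chrom == b.chrom
            and a.start <= b.end and b.start <= a.end)


def fraction_confirmed(query, reference):
    """Share of `query` events overlapped by at least one `reference` event.

    Quadratic pairwise scan: fine for a sketch, slow for genome-wide data."""
    if not query:
        return float("nan")
    hits = sum(any(overlaps(q, r) for r in reference) for q in query)
    return hits / len(query)


def median_width(events):
    """Median length (bp) of the inferred recombination intervals."""
    return median(e.width for e in events)


if __name__ == "__main__":
    wgs = [Crossover("m1", "chr1", 100, 200), Crossover("m1", "chr1", 500, 600),
           Crossover("m2", "chr1", 100, 200)]
    merlin = [Crossover("m1", "chr1", 150, 1000)]
    print(fraction_confirmed(wgs, merlin), median_width(wgs))
```

Because a wide reference interval can overlap several narrow query intervals, overlap-based concordance is asymmetric. The direction used here, the share of query events confirmed by the reference, mirrors the abstract's phrasing of Shapeit2 events "detected by Merlin".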